Existence of Saddle Points in Discrete Markov Games and Its
Application in Numerical Methods for Stochastic Differential Games

Abstract: This work establishes sufficient conditions for the
existence of saddle points in discrete Markov games. The result
reveals the relation between dynamic games and static games
through dynamic programming equations. It enables us to prove
the existence of saddle points of non-separable stochastic
differential games of regime-switching diffusions under
appropriate conditions.

I. Introduction 

The merging of differential games and regime-switching
models stems from a wide range of applications in
communication networks, complex systems, and financial
engineering. Many problems arising in, for example,
pursuit-evasion games, queueing systems in heavy traffic,
risk-sensitive control, and constrained optimization can
be formulated as two-player stochastic differential games [1],
[2], [3]. In another direction, recent efforts to better
describe the random environment have led to the use of
so-called regime-switching models; see [8], [11], [14], [19],
[20] and the many references therein. Since closed-form
solutions are difficult to obtain for many problems arising in
applications, one must be content with numerical
approximations as a viable alternative [10], [12], [15].
A systematic approach to numerical approximation for
stochastic differential games was provided in [6] using Markov
chain approximation methods. The major difficulty in dealing
with such game problems is to prove the existence of the value
of the game. To ensure the existence of saddle points, [6]
requires the objective function and the drift of the diffusion
to be separable with respect to the controls. It is desirable
to relax this separability condition.

Markov chain approximations of stochastic differential
games are themselves discrete Markov games. In this paper,
we aim to develop sufficient conditions for the existence
of saddle points of discrete Markov games. In the proof,
we start with the dynamic programming equations together with
the static game results obtained by Sion [13] and von Neumann
[9], and we establish the relation between static games and
dynamic games through a series of inequalities. This approach
enables us to treat discrete Markov games that are
non-separable with respect to the controls. By virtue of these
results, we can prove the existence of saddle points of the
discrete Markov games arising in numerical approximations of
stochastic differential games when a discretization parameter
$h$ is used. As $h \to 0$, we obtain the existence of
saddle points of non-separable stochastic differential games
using the weak convergence techniques in [7] and [6].

The rest of the paper is arranged as follows. Section II 
begins with the formulation of the discrete Markov games. 
Section III presents sufficient conditions for the existence of 
saddle points of discrete Markov games for both ordinary 
control and relaxed control spaces, respectively. Section IV 
applies the results in the discrete Markov games to stochastic 
differential games. Section V concludes the paper with 
further remarks. 

II. Formulation 

Consider a two-player discrete Markov zero-sum game.
Let $S$ be the finite state space of a Markov chain, and let
$\partial S \subset S$ be a collection of absorbing states. The
control spaces $U_1$ and $U_2$ for player 1 and player 2 are
compact subsets of $\mathbb{R}$. [For notational simplicity, we
have chosen to treat real-valued controls in this paper.] Let
$\{\xi_n, n < \infty\}$ be a controlled discrete-time Markov
chain whose time-independent transition probabilities,
controlled by a pair of sequences
$\{(u_{1,n}, u_{2,n}), n < \infty\}$, are

$$p(x, y \mid r_1, r_2) = P\{\xi_{n+1} = y \mid \xi_n = x,\ u_{1,n} = r_1,\ u_{2,n} = r_2\}, \tag{1}$$

where $u_{i,n} \in U_i$ denotes the decision at time $n$ by player $i$.

Definition 2.1: A control policy $\{(u_{1,n}, u_{2,n}), n < \infty\}$
for the chain $\{\xi_n, n < \infty\}$ is admissible if

$$P\{\xi_{n+1} = y \mid \xi_k, u_{1,k}, u_{2,k},\ k \le n\} = p(\xi_n, y \mid u_{1,n}, u_{2,n}). \tag{2}$$

If there is a function $u_i(\cdot)$ such that $u_{i,n} = u_i(\xi_n)$,
then we refer to $u_i(\cdot)$ as a feedback control of player $i$.

Given the running cost function
$c(\cdot,\cdot,\cdot) : S \times U_1 \times U_2 \to \mathbb{R}^+ \cup \{0\}$
and the terminal cost function $g(\cdot) : S \to \mathbb{R}^+ \cup \{0\}$,
the cost for an initial $\xi_0 = x \in S$ and an admissible control
policy $(u_1, u_2) = \{(u_{1,n}, u_{2,n}) : n < \infty\}$ is defined by

$$W(x, u_1, u_2) = E_x^{u_1,u_2}\Big[\sum_{n=0}^{N-1} c(\xi_n, u_{1,n}, u_{2,n}) + g(\xi_N)\Big], \tag{3}$$

where $N = \min\{n : \xi_n \in \partial S\}$ and $E_x^{u_1,u_2}$ is the
expectation given the initial state $\xi_0 = x$ and the control $(u_1, u_2)$.

In the discrete Markov game, player 1 wants to minimize
the cost, while player 2 wants to maximize it. The two players
have different information available depending on who
makes the decision first (or who "goes first"). We use
$\mathcal{U}_i(1)$ to denote the space of admissible ordinary
controls for which player $i$ goes first. That is, for
$u_i \in \mathcal{U}_i(1)$, there exists a sequence of measurable
functions $F_n(\cdot)$ taking values in $U_i$ such that
$u_{i,n} = F_n(\xi_k, k \le n;\ u_{1,k}, u_{2,k}, k < n)$. Similarly,
we use $\mathcal{U}_i(2)$ to denote the collection of admissible
ordinary controls for which player $i$ goes last. That is,
$u_i \in \mathcal{U}_i(2)$ is determined by a sequence of measurable
functions $F_n(\cdot)$ taking values in $U_i$ such that
$u_{i,n} = F_n(\xi_k, k \le n;\ u_{i,k}, k < n;\ u_{j,k}, k \le n, j \ne i)$.

To proceed, we define upper and lower values by

$$V^+(x) = \min_{u_1 \in \mathcal{U}_1(1)} \max_{u_2 \in \mathcal{U}_2(2)} W(x, u_1, u_2), \tag{4}$$

$$V^-(x) = \max_{u_2 \in \mathcal{U}_2(1)} \min_{u_1 \in \mathcal{U}_1(2)} W(x, u_1, u_2), \tag{5}$$

respectively. It is obvious that $V^-(x) \le V^+(x)$ for all $x \in S$. If
the lower value and upper value are equal, then we say there
exists a saddle point for the game, and its value is

$$V(x) = V^+(x) = V^-(x), \quad \forall x \in S. \tag{6}$$

The corresponding dynamic programming equations are

$$V^+(x) = \min_{r_1 \in U_1} \max_{r_2 \in U_2} \{E_x^{r_1,r_2}[V^+(\xi_1)] + c(x, r_1, r_2)\}, \tag{7}$$

$$V^-(x) = \max_{r_2 \in U_2} \min_{r_1 \in U_1} \{E_x^{r_1,r_2}[V^-(\xi_1)] + c(x, r_1, r_2)\}. \tag{8}$$

In practice, we can find $V^+$ and $V^-$ in (4) and (5) by solving
(7) and (8) iteratively. This is possible owing to the
following lemma. The proof of this lemma can be found in
[4, Lemma 2], and a weaker form in [18].

Lemma 2.2: Suppose $\{\xi_n, n < \infty\}$ is a Markov chain with state
space $S$, absorbing states $\partial S$, and transition probability
$p(x, y \mid r_1, r_2)$. Let there be a real number $\gamma > 0$ with

$$P(\xi_n \in \partial S \mid \xi_0 = x,\ u_{1,k}, u_{2,k},\ k < n) \ge \gamma, \quad \forall x \in S, \tag{9}$$

and let $c(x, r_1, r_2)$ be continuous in $r_1$ and $r_2$. For each
admissible control $(u_1, u_2)$, let the cost $W(x, u_1, u_2)$ be defined
by (3). Then $W(x, u_1, u_2)$ is finite and the solutions of (7) and (8)
are unique. For any initial value $\{V_0^+(x) : x \in S\}$, the sequence

$$V_{n+1}^+(x) = \min_{r_1 \in U_1} \max_{r_2 \in U_2} \{E_x^{r_1,r_2}[V_n^+(\xi_1)] + c(x, r_1, r_2)\} \tag{10}$$

converges to $V^+(x)$, the unique solution of (7), as $n \to \infty$.
Analogously, for any initial value $\{V_0^-(x) : x \in S\}$, the sequence

$$V_{n+1}^-(x) = \max_{r_2 \in U_2} \min_{r_1 \in U_1} \{E_x^{r_1,r_2}[V_n^-(\xi_1)] + c(x, r_1, r_2)\} \tag{11}$$

converges to $V^-(x)$, the unique solution of (8), as $n \to \infty$.
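
The iterations (10) and (11) can be sketched in code as follows. This is a minimal illustration rather than the authors' implementation: it assumes the compact control sets have been replaced by finite grids, and the three-state chain, its costs, and the names `value_iteration`, `p`, `c`, and `g` are hypothetical. Each non-absorbing state jumps to the absorbing state with probability 0.5, so condition (9) holds with $\gamma = 0.5$.

```python
def value_iteration(states, absorbing, U1, U2, p, c, g, upper,
                    tol=1e-12, max_iter=10000):
    """Iterate (10) when upper is True, (11) otherwise, on finite control grids."""
    # Absorbing states keep the terminal cost g; elsewhere the start is arbitrary.
    V = {x: (g(x) if x in absorbing else 0.0) for x in states}
    for _ in range(max_iter):
        V_new = dict(V)
        for x in states:
            if x in absorbing:
                continue

            def phi(r1, r2, x=x):
                # E_x[V(xi_1)] + c(x, r1, r2) under one-step controls (r1, r2)
                return sum(p(x, y, r1, r2) * V[y] for y in states) + c(x, r1, r2)

            if upper:   # player 1 commits first, player 2 replies
                V_new[x] = min(max(phi(r1, r2) for r2 in U2) for r1 in U1)
            else:       # player 2 commits first, player 1 replies
                V_new[x] = max(min(phi(r1, r2) for r1 in U1) for r2 in U2)
        gap = max(abs(V_new[x] - V[x]) for x in states)
        V = V_new
        if gap < tol:
            break
    return V


# Hypothetical toy game: states 0 and 1 are transient, state 2 is absorbing.
STATES = [0, 1, 2]
ABSORBING = {2}
GRID = [-1.0, 0.0, 1.0]          # finite stand-in for a compact control set


def p(x, y, r1, r2):
    if y == 2:
        return 0.5                       # absorption probability (gamma = 0.5)
    if y == 0:
        return 0.25 * (1.0 + r1 * r2)
    return 0.25 * (1.0 - r1 * r2)


def c(x, r1, r2):
    return 1.0 + x + (r1 - r2) ** 2      # nonnegative, non-separable running cost


def g(x):
    return 0.0                           # terminal cost on the absorbing state


V_upper = value_iteration(STATES, ABSORBING, GRID, GRID, p, c, g, upper=True)
V_lower = value_iteration(STATES, ABSORBING, GRID, GRID, p, c, g, upper=False)
```

The running cost in this toy game is neither separable nor convex-concave, so the two limits need not coincide; the sketch only illustrates that both iterations converge and that the lower value never exceeds the upper value.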

III. Existence of Saddle Points 

In this section, we provide sufficient conditions for the 
existence of saddle points in discrete Markov games. An 
existence proof is established through a series of inequalities. 
In addition, the definition of relaxed controls is given as a 
generalization of ordinary controls. It is shown that saddle 
points always exist in relaxed control space. 

Definition 3.1: A function $f(r_1, r_2)$ is said to be convex-concave
with respect to $(r_1, r_2)$ if $f(\cdot, r_2)$ is convex and
$f(r_1, \cdot)$ is concave.

Next, we present a well-known minimax principle in static
games, which was obtained by Sion in [13].

Lemma 3.2: Let $M_1$ and $M_2$ be compact spaces, and let
$\phi(\cdot, \cdot)$ be a convex-concave function on $M_1 \times M_2$. Then

$$\min_{r_1 \in M_1} \max_{r_2 \in M_2} \phi(r_1, r_2) = \max_{r_2 \in M_2} \min_{r_1 \in M_1} \phi(r_1, r_2).$$
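
A small numerical check may help to see what the lemma asserts. The sketch below is illustrative only: the two test functions and the grid are assumptions chosen for this example, and a finite grid merely stands in for the compact interval $[-1, 1]$. The first function is convex in $r_1$ and concave in $r_2$ with its saddle point at the grid point $(0, 0)$; the second is convex in both arguments, so the hypothesis of the lemma fails.

```python
def minmax(phi, M1, M2):
    """min over r1 of max over r2 (player 1 commits first)."""
    return min(max(phi(r1, r2) for r2 in M2) for r1 in M1)


def maxmin(phi, M1, M2):
    """max over r2 of min over r1 (player 2 commits first)."""
    return max(min(phi(r1, r2) for r1 in M1) for r2 in M2)


# Grid on [-1, 1] that contains 0.0 (an assumption of this example).
GRID = [k / 10 for k in range(-10, 11)]


def convex_concave(r1, r2):
    return r1 * r1 - r2 * r2 + r1 * r2   # convex in r1, concave in r2


def convex_convex(r1, r2):
    return (r1 - r2) ** 2                # convex in r2 as well: hypothesis fails


upper_cc = minmax(convex_concave, GRID, GRID)
lower_cc = maxmin(convex_concave, GRID, GRID)
upper_bad = minmax(convex_convex, GRID, GRID)
lower_bad = maxmin(convex_convex, GRID, GRID)
```

For the convex-concave function the two orders of optimization agree, whereas for $(r_1 - r_2)^2$ the minimizer is penalized for moving first and a gap opens between the two values.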

One of the following two assumptions is needed for the
existence theorem.

(H1) $p(x, y \mid r_1, r_2)$ and $c(x, r_1, r_2)$ are continuous and
separable in $r_1$ and $r_2$.

(H2) $p(x, y \mid r_1, r_2)$ and $c(x, r_1, r_2)$ are convex-concave
with respect to $(r_1, r_2)$.

Theorem 3.3: Assume either (H1) or (H2). Let $\{\xi_n, n < \infty\}$
be a Markov chain as in Lemma 2.2. Let $V^+(x)$ and $V^-(x)$
be the associated upper and lower values defined in (4) and (5).
Then there exists a saddle point, that is,

$$V^+(x) = V^-(x), \quad \forall x \in S.$$

Proof. Define two functions $\phi^+(\cdot)$ and $\phi^-(\cdot)$ by

$$\phi^+(x, r_1, r_2) = \sum_{y \in S} p(x, y \mid r_1, r_2) V^+(y) + c(x, r_1, r_2),$$

$$\phi^-(x, r_1, r_2) = \sum_{y \in S} p(x, y \mid r_1, r_2) V^-(y) + c(x, r_1, r_2).$$

The dynamic programming equations (7) and (8) can be
rewritten as

$$V^+(x) = \min_{r_1 \in U_1} \max_{r_2 \in U_2} \phi^+(x, r_1, r_2),$$

$$V^-(x) = \max_{r_2 \in U_2} \min_{r_1 \in U_1} \phi^-(x, r_1, r_2).$$

Under either assumption (H1) or (H2), by Lemma 3.2,

$$\min_{r_1 \in U_1} \max_{r_2 \in U_2} \phi^+(x, r_1, r_2) = \max_{r_2 \in U_2} \min_{r_1 \in U_1} \phi^+(x, r_1, r_2). \tag{12}$$

Let $\rho = \max_{x \in S} \{V^+(x) - V^-(x)\} \ge 0$. Then

$$V^+(x) \le V^-(x) + \rho, \quad \forall x \in S. \tag{13}$$

In particular, there exists $\hat{x} \in S$ such that equality holds in (13):

$$V^+(\hat{x}) = V^-(\hat{x}) + \rho. \tag{14}$$

For $\hat{x}$ given in (14), a series of inequalities follows:

$$\begin{aligned}
V^+(\hat{x}) &= \min_{r_1 \in U_1} \max_{r_2 \in U_2} \phi^+(\hat{x}, r_1, r_2) \\
&= \max_{r_2 \in U_2} \min_{r_1 \in U_1} \phi^+(\hat{x}, r_1, r_2) \\
&= \max_{r_2 \in U_2} \min_{r_1 \in U_1} \Big\{\sum_{y \in S} p(\hat{x}, y \mid r_1, r_2) V^+(y) + c(\hat{x}, r_1, r_2)\Big\} \\
&\le \max_{r_2 \in U_2} \min_{r_1 \in U_1} \Big\{\sum_{y \in S} p(\hat{x}, y \mid r_1, r_2) (V^-(y) + \rho) + c(\hat{x}, r_1, r_2)\Big\} \\
&= \max_{r_2 \in U_2} \min_{r_1 \in U_1} \phi^-(\hat{x}, r_1, r_2) + \rho \\
&= V^-(\hat{x}) + \rho.
\end{aligned} \tag{15}$$

By virtue of (14), we conclude that all inequalities in (15) are
in fact equalities, and this implies

$$V^+(y) = V^-(y) + \rho, \quad \forall y \in S.$$

Note that $V^+(x) = V^-(x)$ for all $x \in \partial S$. Hence $\rho = 0$.
The existence of the saddle point is established. □
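
Lemma 3.2 is stated for convex-concave functions, so the way (12) follows under (H1) may deserve one extra line. The following is a sketch only. It assumes that separability means an additive split, mirroring the form used later in (A6), and the component functions $p_1, p_2, c_1, c_2, f_1, f_2$ are notation introduced here for illustration rather than taken from the paper.

```latex
% Assumed additive form of (H1); p_i, c_i, f_i are illustrative notation.
\begin{align*}
p(x,y \mid r_1,r_2) &= p_1(x,y \mid r_1) + p_2(x,y \mid r_2), \qquad
c(x,r_1,r_2) = c_1(x,r_1) + c_2(x,r_2), \\
\phi^+(x,r_1,r_2) &= f_1(x,r_1) + f_2(x,r_2), \qquad
f_i(x,r_i) = \sum_{y \in S} p_i(x,y \mid r_i)\, V^+(y) + c_i(x,r_i), \\
\min_{r_1 \in U_1} \max_{r_2 \in U_2} \phi^+(x,r_1,r_2)
  &= \min_{r_1 \in U_1} f_1(x,r_1) + \max_{r_2 \in U_2} f_2(x,r_2)
   = \max_{r_2 \in U_2} \min_{r_1 \in U_1} \phi^+(x,r_1,r_2).
\end{align*}
```

Under this reading the minima and maxima are attained because $f_1$ and $f_2$ are continuous on compact sets, and the order of optimization is irrelevant because each player's choice affects only its own summand.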

The above theorem gives sufficient conditions for the 
existence of saddle points. We note that there always exist 
saddle points in relaxed control space with merely continuity 
assumed. 

Definition 3.4: A control policy $\{(m_{1,n}, m_{2,n}), n < \infty\}$
for the chain $\{\xi_n, n < \infty\}$ is said to be a relaxed control
policy if $m_{i,n}$ is a probability measure on $\mathcal{B}(U_i)$, the
$\sigma$-algebra of Borel subsets of $U_i$.

A more general definition of relaxed control is given by
Definition 4.1 in the context of stochastic differential games.
Let $\mathcal{P}(U_1)$ and $\mathcal{P}(U_2)$ be the collections of
probability measures on $\mathcal{B}(U_1)$ and $\mathcal{B}(U_2)$.
Slightly abusing notation, we generalize a real function
$f(\cdot, \cdot)$ on $U_1 \times U_2$ to a function $f$ on
$\mathcal{P}(U_1) \times \mathcal{P}(U_2)$ as follows:

$$f(\mu_1, \mu_2) = \int_{U_1} \int_{U_2} f(r_1, r_2)\, \mu_1(dr_1)\, \mu_2(dr_2).$$

Using the notation of the relaxed control representation, the
transition probability function is

$$p(x, y \mid \mu_1, \mu_2) = \int_{U_1} \int_{U_2} p(x, y \mid r_1, r_2)\, \mu_1(dr_1)\, \mu_2(dr_2),$$

and the cost under the relaxed control policy
$(m_1, m_2) = \{(m_{1,n}, m_{2,n}), n < \infty\}$ is

$$W(x, m_1, m_2) = E_x^{m_1,m_2}\Big[\sum_{n=0}^{N-1} c(\xi_n, m_{1,n}, m_{2,n}) + g(\xi_N)\Big].$$

We use $\Gamma_i(1)$ to denote the space of admissible relaxed
controls for which player $i$ goes first. That is, for
$m_i \in \Gamma_i(1)$, there exists a sequence of measurable functions
$H_n(\cdot)$ taking values in $\mathcal{P}(U_i)$ such that

$$m_{i,n} = H_n(\xi_k, k \le n;\ m_{1,k}, m_{2,k}, k < n).$$

Analogously, we use $\Gamma_i(2)$ to denote the space of admissible
relaxed controls for which player $i$ goes last. That is, for
$m_i \in \Gamma_i(2)$, there exists a sequence of measurable functions
$H_n(\cdot)$ taking values in $\mathcal{P}(U_i)$ such that

$$m_{i,n} = H_n(\xi_k, k \le n;\ m_{i,k}, k < n;\ m_{j,k}, k \le n, j \ne i).$$

The upper and lower values associated with the relaxed control
space are defined by

$$V_m^+(x) = \min_{m_1 \in \Gamma_1(1)} \max_{m_2 \in \Gamma_2(2)} W(x, m_1, m_2), \tag{16}$$

$$V_m^-(x) = \max_{m_2 \in \Gamma_2(1)} \min_{m_1 \in \Gamma_1(2)} W(x, m_1, m_2), \tag{17}$$

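respectively. To proceed, we present another static game
result, obtained by von Neumann [9].

Lemma 3.5: Let $M_1$ and $M_2$ be finite sets. Let $\phi(\cdot, \cdot)$ be
a function on $M_1 \times M_2$, and let $\mu_1 \in \mathcal{P}(M_1)$ and
$\mu_2 \in \mathcal{P}(M_2)$ be probability measures on $M_1$ and $M_2$. Then

$$\min_{\mu_1 \in \mathcal{P}(M_1)} \max_{\mu_2 \in \mathcal{P}(M_2)} \phi(\mu_1, \mu_2) = \max_{\mu_2 \in \mathcal{P}(M_2)} \min_{\mu_1 \in \mathcal{P}(M_1)} \phi(\mu_1, \mu_2).$$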
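
The effect of passing to probability measures can be seen on the smallest nontrivial example. The sketch below is an illustration under stated assumptions, not part of the paper: the payoff matrix is the classical matching-pennies game, `bilinear` is a hypothetical helper implementing the relaxed extension of $\phi$ for finitely supported measures, and the simplex is searched on a coarse grid. Because the extension is linear in each argument, the inner optimization only needs to inspect point masses.

```python
def bilinear(phi, mu1, mu2):
    """Relaxed extension: sum over i, j of phi[i][j] * mu1[i] * mu2[j]."""
    return sum(mu1[i] * mu2[j] * phi[i][j]
               for i in range(len(mu1)) for j in range(len(mu2)))


# Matching pennies; the row player (player 1) minimizes, the column player maximizes.
PHI = [[1.0, -1.0],
       [-1.0, 1.0]]

# Ordinary (pure) controls: the two orders of optimization disagree.
pure_upper = min(max(row) for row in PHI)
pure_lower = max(min(PHI[i][j] for i in range(2)) for j in range(2))

# Relaxed controls: probability vectors (q, 1 - q) on a grid containing q = 0.5.
SIMPLEX = [(k / 100, 1 - k / 100) for k in range(101)]
POINT_MASSES = [(1.0, 0.0), (0.0, 1.0)]

# Linearity in the inner variable lets the inner player use point masses only.
mixed_upper = min(max(bilinear(PHI, m1, m2) for m2 in POINT_MASSES)
                  for m1 in SIMPLEX)
mixed_lower = max(min(bilinear(PHI, m1, m2) for m1 in POINT_MASSES)
                  for m2 in SIMPLEX)
```

With ordinary controls the upper value is 1 and the lower value is -1, while over probability measures both values equal 0, attained at the uniform measure.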

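Theorem 3.6: Let $\{\xi_n, n < \infty\}$ be a Markov chain as in
Lemma 2.2 with relaxed controls used. Assume $p(x, y \mid \cdot, \cdot)$
and $c(x, \cdot, \cdot)$ are continuous on $U_1 \times U_2$. Let
$V_m^+(x)$ and $V_m^-(x)$ be the associated upper and lower values
of (16) and (17). Then there always exists a saddle point, that is,

$$V_m^+(x) = V_m^-(x), \quad \forall x \in S.$$

Proof. Define two functions $\phi_m^+(\cdot)$ and $\phi_m^-(\cdot)$ by

$$\phi_m^+(x, \mu_1, \mu_2) = \sum_{y \in S} p(x, y \mid \mu_1, \mu_2) V_m^+(y) + c(x, \mu_1, \mu_2),$$

$$\phi_m^-(x, \mu_1, \mu_2) = \sum_{y \in S} p(x, y \mid \mu_1, \mu_2) V_m^-(y) + c(x, \mu_1, \mu_2).$$

Then the dynamic programming equations in the relaxed control
space can be written as

$$V_m^+(x) = \min_{\mu_1 \in \mathcal{P}(U_1)} \max_{\mu_2 \in \mathcal{P}(U_2)} \phi_m^+(x, \mu_1, \mu_2),$$

$$V_m^-(x) = \max_{\mu_2 \in \mathcal{P}(U_2)} \min_{\mu_1 \in \mathcal{P}(U_1)} \phi_m^-(x, \mu_1, \mu_2).$$

Note that $c(x, \cdot, \cdot)$ is continuous on the compact set
$U_1 \times U_2$. Hence for every $\varepsilon > 0$, there exists a finite
subset $U_1^\varepsilon \times U_2^\varepsilon \subset U_1 \times U_2$ such that

$$\Big| \min_{\mu_1 \in \mathcal{P}(U_1)} \max_{\mu_2 \in \mathcal{P}(U_2)} c(x, \mu_1, \mu_2) - \min_{\mu_1^\varepsilon \in \mathcal{P}(U_1^\varepsilon)} \max_{\mu_2^\varepsilon \in \mathcal{P}(U_2^\varepsilon)} c(x, \mu_1^\varepsilon, \mu_2^\varepsilon) \Big| < \varepsilon, \tag{19}$$

$$\Big| \max_{\mu_2 \in \mathcal{P}(U_2)} \min_{\mu_1 \in \mathcal{P}(U_1)} c(x, \mu_1, \mu_2) - \max_{\mu_2^\varepsilon \in \mathcal{P}(U_2^\varepsilon)} \min_{\mu_1^\varepsilon \in \mathcal{P}(U_1^\varepsilon)} c(x, \mu_1^\varepsilon, \mu_2^\varepsilon) \Big| < \varepsilon. \tag{20}$$

Letting $\varepsilon \to 0$ in (19) and (20) and using Lemma 3.5, we have

$$\min_{\mu_1 \in \mathcal{P}(U_1)} \max_{\mu_2 \in \mathcal{P}(U_2)} c(x, \mu_1, \mu_2) = \max_{\mu_2 \in \mathcal{P}(U_2)} \min_{\mu_1 \in \mathcal{P}(U_1)} c(x, \mu_1, \mu_2). \tag{21}$$

Similarly, we obtain the equality for the function $p(x, y \mid \cdot, \cdot)$:

$$\min_{\mu_1 \in \mathcal{P}(U_1)} \max_{\mu_2 \in \mathcal{P}(U_2)} p(x, y \mid \mu_1, \mu_2) = \max_{\mu_2 \in \mathcal{P}(U_2)} \min_{\mu_1 \in \mathcal{P}(U_1)} p(x, y \mid \mu_1, \mu_2). \tag{22}$$

Equalities (21) and (22) imply

$$\min_{\mu_1 \in \mathcal{P}(U_1)} \max_{\mu_2 \in \mathcal{P}(U_2)} \phi_m^+(x, \mu_1, \mu_2) = \max_{\mu_2 \in \mathcal{P}(U_2)} \min_{\mu_1 \in \mathcal{P}(U_1)} \phi_m^+(x, \mu_1, \mu_2). \tag{23}$$

The rest of the proof follows the lines of the inequalities
in (15). The details are omitted. □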



IV. Numerical Methods for Regime-Switching
Stochastic Differential Games

In this section, we formulate stochastic differential games
with regime switching. Numerical methods using Markov
chain approximation lead to a sequence of discrete Markov
games of the type discussed in the previous section.
Theorem 3.3 gives sufficient conditions for the existence of
saddle points and facilitates the proof.

A. Formulation 

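Consider a two-player stochastic game of regime-switching
diffusions. For a finite set $\mathcal{M} = \{1, \dots, m_0\}$,
$x \in \mathbb{R}^{l_0}$,
$b(\cdot,\cdot,\cdot,\cdot) : \mathbb{R}^{l_0} \times \mathcal{M} \times \mathbb{R} \times \mathbb{R} \to \mathbb{R}^{l_0}$,
and $\sigma(\cdot,\cdot) : \mathbb{R}^{l_0} \times \mathcal{M} \to \mathbb{R}^{l_0 \times l_0}$,
the dynamic system is given by

$$x(t) = x(0) + \int_0^t b(x(s), \alpha(s), u_1(s), u_2(s))\, ds + \int_0^t \sigma(x(s), \alpha(s))\, dw(s), \tag{24}$$

where for each $i = 1, 2$, $u_i(\cdot)$ is a control for player $i$,
$w(\cdot)$ is a standard $\mathbb{R}^{l_0}$-valued Brownian motion, and
$\alpha(\cdot)$ is a continuous-time Markov chain having state space
$\mathcal{M}$ with generator $Q = (q_{\iota\ell}) \in \mathbb{R}^{m_0 \times m_0}$.
Let $\{\mathcal{F}_t : 0 \le t\}$ be a filtration, which might depend on
the controls, and which measures at least $\{(w(s), \alpha(s)) : s \le t\}$.
We suppose that for each $i = 1, 2$, $u_i(\cdot)$ is
$\mathcal{F}_t$-adapted and takes values in a compact subset
$U_i \subset \mathbb{R}$; such controls are called admissible. Denote
$A(x, \iota) = \sigma(x, \iota)\sigma'(x, \iota) = (a_{j_0 k_0}(x, \iota)) \in \mathbb{R}^{l_0 \times l_0}$,
which is symmetric and positive definite.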

Let $G \subset \mathbb{R}^{l_0}$ be a compact set that is the closure of
its interior $G^\circ$, and let $\tau$ be the first exit time of $x(t)$
from $G^\circ$, that is,

$$\tau = \inf\{t : x(t) \notin G^\circ\}. \tag{25}$$

Using a real number $\beta > 0$ to denote the discount factor, let
the cost function be

$$W(x, \iota, u) = E_{x,\iota}^{u}\Big[\int_0^\tau e^{-\beta s} k(x(s), \alpha(s), u(s))\, ds + g(x(\tau), \alpha(\tau))\Big], \tag{26}$$

where $k(\cdot)$ and $g(\cdot)$ are functions representing the running
cost and terminal cost, respectively, and $E_{x,\iota}^{u}$ denotes the
expectation taken with the initial data $x(0) = x$ and $\alpha(0) = \iota$
and the given control process $u(\cdot) = (u_1(\cdot), u_2(\cdot))$. Next, we
introduce the relaxed control representation; see [6], [7].

Definition 4.1: Let $\mathcal{B}(U \times [0, \infty))$ be the
$\sigma$-algebra of Borel subsets of $U \times [0, \infty)$. An admissible
relaxed control $m(\cdot)$ is a measure on
$\mathcal{B}(U \times [0, \infty))$ such that $m(U \times [0, t]) = t$
for each $t \ge 0$. Given a relaxed control $m(\cdot)$, there is an
$m_t(\cdot)$ such that $m(dr\, dt) = m_t(dr)\, dt$. In fact, we can define
$m_t(B) = \lim_{\delta \to 0} m(B \times [t - \delta, t]) / \delta$ for
$B \in \mathcal{B}(U)$.

To proceed, we need the following assumptions.

(A1) For each $\iota \in \mathcal{M}$, $k(\cdot, \iota, \cdot, \cdot)$ and
$b(\cdot, \iota, \cdot, \cdot)$ are continuous functions on the compact
set $G \times U_1 \times U_2$.

(A2) For each $\iota \in \mathcal{M}$, the functions $\sigma(\cdot, \iota)$
and $g(\cdot, \iota)$ are continuous on $G$.

(A3) Equation (24), where the controls are replaced by
relaxed controls, has a unique weak sense solution
(i.e., unique in the sense of distribution) for each
admissible triple $(w(\cdot), \alpha(\cdot), m(\cdot))$, where
$m(\cdot) = (m_1(\cdot), m_2(\cdot))$.

(A4) For any $\iota \in \mathcal{M}$ and $j_0, k_0 \in \{1, 2, \dots, l_0\}$,

$$a_{j_0 j_0}(x, \iota) > \sum_{k_0 \ne j_0} |a_{j_0 k_0}(x, \iota)|.$$

(A5) Let

$$\hat{\tau}(\phi) = \begin{cases} \infty, & \text{if } \phi(t) \in G^\circ \text{ for all } t < \infty, \\ \inf\{t : \phi(t) \notin G^\circ\}, & \text{otherwise.} \end{cases}$$

The function $\hat{\tau}(\cdot)$ is continuous as a mapping from
$D[0, \infty)$ to $[0, \infty]$ with probability one relative to the
measure induced by any solution with initial condition
$(x, \iota)$, where $D[0, \infty)$ denotes the space of functions
that are right continuous and have left limits, endowed
with the Skorohod topology, and $[0, \infty]$ is the interval
$[0, \infty)$ compactified (see [7, p. 259]).

(A6) The functions $b(\cdot)$ and $k(\cdot)$ are separable in $r_1$ and
$r_2$ for every $(x, \iota) \in G \times \mathcal{M}$. That is,
$b(x, \iota, r_1, r_2) = \sum_{i=1}^{2} b_i(x, \iota, r_i)$ and
$k(x, \iota, r_1, r_2) = \sum_{i=1}^{2} k_i(x, \iota, r_i)$.

(A7) The cost $k(\cdot)$ is convex-concave with respect to $(r_1, r_2)$,
and there exist $\mathbb{R}^{l_0}$-valued continuous functions
$b_i(x, \iota)$ ($i = 0, 1, 2, 3$) such that
$b(x, \iota, r_1, r_2) = r_1 r_2 b_0(x, \iota) + r_1 b_1(x, \iota) + r_2 b_2(x, \iota) + b_3(x, \iota)$.

Assumption (A4) is used for the construction of the transition
probabilities of the approximating Markov chain. It requires
that the diffusion matrix be diagonally dominant. If the
given dynamic system does not satisfy (A4), then we can
adjust the coordinate system so that (A4) holds; see
[7, p. 110]. (A5) is a broad condition that is satisfied in
most applications. Its main purpose is to avoid the tangency
problem discussed in [7, p. 278]. Later, we will establish
the existence of saddle points using either (A6) or (A7) in
addition to (A1)-(A5). Condition (A7) allows differential
games that are non-separable with respect to the controls.

Now we are ready to define upper values, lower values,
and saddle points of differential games; see [6] for the
corresponding definitions for systems without regime switching.
Let $\mathcal{U}_i$ be the collection of all admissible ordinary
controls with respect to $(w(\cdot), \alpha(\cdot))$. For $\Delta > 0$, let
$\mathcal{U}_i(\Delta) \subset \mathcal{U}_i$ be the subset of controls
$u_i(\cdot)$ that are piecewise constant on the intervals
$[k\Delta, k\Delta + \Delta)$, $k = 0, 1, 2, \dots$, and for which
$u_i(k\Delta)$ is $\mathcal{F}_{k\Delta}$-measurable.

Let $\mathcal{L}_1(\Delta) \subset \mathcal{U}_1(\Delta)$ denote the set of
such piecewise constant controls for player 1 that are determined by
measurable real-valued functions $Q_{1,n}(\cdot)$:

$$u_1(n\Delta) = Q_{1,n}(w(s), \alpha(s), u(s), s < n\Delta). \tag{27}$$

We can define $\mathcal{L}_2(\Delta)$ and the associated rule $u_2$ for
player 2 analogously to (27).

Thus we can always suppose that if the control of
(for example) player 1 is determined by a form such as
(27), then (in relaxed control terminology) the law of
$(w(t), \alpha(t), m_2(t))$ for $n\Delta \le t < (n+1)\Delta$ is determined
recursively by the past information

$$\{w(s), \alpha(s), m_2(s), s < t;\ m_1(s), s < n\Delta\}. \tag{28}$$

Definition 4.2: For the initial condition $x(0) = x$, $\alpha(0) = \iota$,
define the upper and lower values for the game as

$$V^+(x, \iota) = \lim_{\Delta \to 0} \inf_{u_1 \in \mathcal{L}_1(\Delta)} \sup_{u_2 \in \mathcal{U}_2} W(x, \iota, u_1, u_2), \tag{29}$$

$$V^-(x, \iota) = \lim_{\Delta \to 0} \sup_{u_2 \in \mathcal{L}_2(\Delta)} \inf_{u_1 \in \mathcal{U}_1} W(x, \iota, u_1, u_2). \tag{30}$$

If the lower and upper values are equal, then we say there
exists a saddle point for the game, and its value is

$$V^+(x, \iota) = V^-(x, \iota) = V(x, \iota), \quad \forall x \in G,\ \iota \in \mathcal{M}. \tag{31}$$

B. Markov Chain Approximations 

Here, we construct a two-component Markov chain.
The discretization of the differential game leads to a sequence
of discrete Markov games. The approximation is of finite
difference type. The basis of the approximation is a
discrete-time, finite-state, controlled Markov chain
$\{(\xi_n^h, \alpha_n^h) : n < \infty\}$ whose properties are locally
consistent with those of (24).

For each $h > 0$, let $G_h$ be a finite subset of $G$ such that
$d(G_h, G) \to 0$ as $h \to 0$, where $d(\cdot)$ is a metric defined by

$$d(G, G_h) = \max_{p \in G} \min_{q \in G_h} d(p, q). \tag{32}$$
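
As a rough illustration of (32), the following sketch computes the distance between the unit interval and uniform grids of decreasing mesh. It rests on several assumptions that are not in the paper: $G = [0, 1]$ is represented by a fine finite sample, the pointwise distance $d(p, q)$ is taken to be $|p - q|$, and the helper names are hypothetical.

```python
def one_sided_distance(G_sample, G_h):
    """max over p in G of min over q in G_h of |p - q|, cf. (32)."""
    return max(min(abs(p - q) for q in G_h) for p in G_sample)


def grid(h):
    """Uniform grid on [0, 1] with mesh h (assumes 1/h is an integer)."""
    n = round(1.0 / h)
    return [j * h for j in range(n + 1)]


# Fine sample standing in for the continuum G = [0, 1].
G_SAMPLE = [k / 1000 for k in range(1001)]

distances = {h: one_sided_distance(G_SAMPLE, grid(h)) for h in (0.2, 0.1, 0.05)}
```

For a uniform grid the worst-approximated points are the cell midpoints, so the distance equals half the mesh size and tends to zero with $h$, as required of $G_h$.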

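Let $\{(\xi_n^h, \alpha_n^h) : n < \infty\}$ be a controlled discrete-time
Markov chain on the discrete state space $G_h \times \mathcal{M}$ with
transition probabilities denoted by $p^h((x, \iota), (y, \ell) \mid r)$,
where $r = (r_1, r_2) \in U_1 \times U_2$. We use $(u_{1,n}^h, u_{2,n}^h)$
to denote the actual control action for the chain at discrete time $n$.
Suppose we have a positive function $\Delta t^h(\cdot)$ on
$G_h \times \mathcal{M} \times U_1 \times U_2$ such that
$\sup_{x,\iota,r} \Delta t^h(x, \iota, r) \to 0$ as $h \to 0$,
but $\inf_{x,\iota,r} \Delta t^h(x, \iota, r) > 0$ for each $h > 0$. We take
an interpolation of the discrete Markov chain
$\{(\xi_n^h, \alpha_n^h)\}$ by using the interpolation interval
$\Delta t_n^h = \Delta t^h(\xi_n^h, \alpha_n^h, u_{1,n}^h, u_{2,n}^h)$.
Now we give the definition of local consistency.

Definition 4.3: Let $\{p^h((x, \iota), (y, \ell) \mid r)\}$ for $(x, \iota)$
and $(y, \ell)$ in $G_h \times \mathcal{M}$ and $r \in U_1 \times U_2$ be a
collection of well-defined transition probabilities for the
two-component Markov chain $\{(\xi_n^h, \alpha_n^h)\}$, an approximation to
$(x(\cdot), \alpha(\cdot))$. Define the difference
$\Delta\xi_n^h = \xi_{n+1}^h - \xi_n^h$. Assume
$\lim_{h \to 0} \sup_{x,\iota,r} \Delta t^h(x, \iota, r) = 0$. Denote by
$E_{x,\iota,n}^{r,h}$, $\mathrm{cov}_{x,\iota,n}^{r,h}$, and
$P_{x,\iota,n}^{r,h}$ the conditional expectation, covariance, and
probability given
$\{\xi_k^h, \alpha_k^h, u_{1,k}^h, u_{2,k}^h, k \le n,\ \xi_n^h = x,\ \alpha_n^h = \iota,\ (u_{1,n}^h, u_{2,n}^h) = r\}$.
The sequence $\{(\xi_n^h, \alpha_n^h)\}$ is said to be locally consistent
with (24), for $\Delta t^h = \Delta t^h(x, \iota, r)$, if

$$\begin{aligned}
E_{x,\iota,n}^{r,h} \Delta\xi_n^h &= b(x, \iota, r)\Delta t^h + o(\Delta t^h), \\
\mathrm{cov}_{x,\iota,n}^{r,h} \Delta\xi_n^h &= A(x, \iota)\Delta t^h + o(\Delta t^h), \\
P_{x,\iota,n}^{r,h}\{\alpha_{n+1}^h = \ell\} &= q_{\iota\ell}\Delta t^h + o(\Delta t^h), \quad \text{for } \ell \ne \iota, \\
P_{x,\iota,n}^{r,h}\{\alpha_{n+1}^h = \iota\} &= 1 + q_{\iota\iota}\Delta t^h + o(\Delta t^h), \\
\sup_{n} |\Delta\xi_n^h| &\to 0 \quad \text{as } h \to 0.
\end{aligned} \tag{33}$$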

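To approximate the cost defined in (26), we define a cost
function using the Markov chain above. Let

$$t_n^h = \sum_{k=0}^{n-1} \Delta t_k^h, \quad \text{and} \quad N_h = \inf\{n : \xi_n^h \notin G_h^\circ\}.$$

The cost for $u^h = \{(u_{1,n}^h, u_{2,n}^h)\}$ and initial $(x, \iota)$ is

$$W^h(x, \iota, u^h) = E_{x,\iota}^{u^h}\Big[\sum_{n=0}^{N_h - 1} e^{-\beta t_n^h} k(\xi_n^h, \alpha_n^h, u_n^h)\, \Delta t_n^h + g(\xi_{N_h}^h, \alpha_{N_h}^h)\Big]. \tag{34}$$

We use $\mathcal{U}_i^h(1)$ to denote the space of ordinary controls
for which player $i$ goes first, with strategies defined by
measurable functions of a type similar to (27). That is, for
$u_i^h \in \mathcal{U}_i^h(1)$, $u_{i,n}^h$ is determined by

$$\{\xi_k^h, \alpha_k^h, k \le n;\ u_{1,k}^h, u_{2,k}^h, k < n\}.$$

By $\mathcal{U}_i^h(2)$ we denote the collection of ordinary controls
for which player $i$ goes last. For $u_i^h \in \mathcal{U}_i^h(2)$,
$u_{i,n}^h$ is determined by

$$\{\xi_k^h, \alpha_k^h, k \le n;\ u_{i,k}^h, k < n;\ u_{j,k}^h, k \le n, j \ne i\}.$$

The associated upper and lower values are defined as

$$V^{h,+}(x, \iota) = \inf_{u_1^h \in \mathcal{U}_1^h(1)} \sup_{u_2^h \in \mathcal{U}_2^h(2)} W^h(x, \iota, u_1^h, u_2^h), \tag{35}$$

$$V^{h,-}(x, \iota) = \sup_{u_2^h \in \mathcal{U}_2^h(1)} \inf_{u_1^h \in \mathcal{U}_1^h(2)} W^h(x, \iota, u_1^h, u_2^h). \tag{36}$$

C. Saddle Points for the Markov Chain Approximation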

In this section, we present a locally consistent discrete
Markov game $\{(\xi_n^h, \alpha_n^h)\}$ generated by a central finite
difference scheme for the purpose of analysis. Under assumptions
(A1)-(A5) together with either (A6) or (A7), we can apply
Theorem 3.3 to show the existence of saddle points for
each $h$. Letting $h \to 0$, the upper (lower) values converge
to those of the stochastic differential game by Lemma 4.6,
which yields the existence of saddle points.

First, the transition probabilities for $\{(\xi_n^h, \alpha_n^h)\}$ are

$$\begin{aligned}
p^h((x,\iota),(x \pm e_{j_0}h,\iota) \mid r) &= \frac{a_{j_0 j_0}(x,\iota) - \sum_{k_0 \ne j_0} |a_{j_0 k_0}(x,\iota)| \pm h\, b_{j_0}(x,\iota,r)}{2\,(D^h(x,\iota) - \beta h^2)}, && j_0 = 1, 2, \dots, l_0, \\
p^h((x,\iota),(x + e_{j_0}h + e_{k_0}h,\iota) \mid r) &= p^h((x,\iota),(x - e_{j_0}h - e_{k_0}h,\iota) \mid r) = \frac{a_{j_0 k_0}^+(x,\iota)/2}{D^h(x,\iota) - \beta h^2}, && j_0 < k_0, \\
p^h((x,\iota),(x + e_{j_0}h - e_{k_0}h,\iota) \mid r) &= \frac{a_{j_0 k_0}^-(x,\iota)/2}{D^h(x,\iota) - \beta h^2}, && j_0 \ne k_0, \\
p^h((x,\iota),(y,\ell) \mid r) &= 0, && \text{otherwise,}
\end{aligned} \tag{37}$$

where 



Set the interpolation interval as $\Delta t^h(x, \iota) = h^2 / D^h(x, \iota)$.
By (A4), $D^h(x, \iota) - \beta h^2 > 0$. Also, we have
$\sum_{(y,\ell)} p^h((x, \iota), (y, \ell) \mid r) = 1$. To ensure that
$p^h(\cdot)$ is always nonnegative, we require

$$h \le \min_{j_0} \frac{a_{j_0 j_0}(x, \iota) - \sum_{k_0 \ne j_0} |a_{j_0 k_0}(x, \iota)|}{\max_{r} |b_{j_0}(x, \iota, r_1, r_2)|}. \tag{38}$$


Lemma 4.4: Assume (A1), (A2), (A4), and that $h$ satisfies
(38). The Markov chain $(\xi_n^h, \alpha_n^h)$ with transition
probabilities $\{p^h(\cdot)\}$ and interpolation interval
$\Delta t^h(\cdot)$ defined above is locally consistent with (24).

Proof. The criteria in (33) can be verified through a series
of calculations; the details are omitted. □
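
The kind of calculation behind such a verification can be illustrated on a much simpler chain. The sketch below is not the scheme (37): it assumes a single regime, one space dimension, and the standard central-difference chain for a scalar diffusion with drift $b$ and variance $\sigma^2$, namely $p(x, x \pm h) = (\sigma^2 \pm h b)/(2\sigma^2)$ with $\Delta t = h^2/\sigma^2$. The function names and the numerical values of `B` and `SIGMA2` are hypothetical.

```python
def central_difference_chain(b, sigma2, h):
    """Transition probabilities to x + h and x - h, and the interpolation interval."""
    if h * abs(b) > sigma2:
        # Analogue of a mesh restriction: otherwise a probability is negative.
        raise ValueError("h too large: a transition probability would be negative")
    p_up = (sigma2 + h * b) / (2.0 * sigma2)
    p_down = (sigma2 - h * b) / (2.0 * sigma2)
    dt = h * h / sigma2
    return p_up, p_down, dt


def local_moments(b, sigma2, h):
    """Conditional mean and variance of one increment of the chain."""
    p_up, p_down, dt = central_difference_chain(b, sigma2, h)
    mean = h * p_up - h * p_down
    var = h * h * (p_up + p_down) - mean ** 2
    return mean, var, dt


B, SIGMA2 = 0.7, 2.0          # assumed drift and variance at a fixed state
checks = []
for h in (0.1, 0.01, 0.001):
    mean, var, dt = local_moments(B, SIGMA2, h)
    mean_error = abs(mean - B * dt)              # exactly b * dt up to rounding
    var_error_rate = (var - SIGMA2 * dt) / dt    # equals -b^2 * dt, hence o(1)
    checks.append((h, dt, mean_error, var_error_rate))
```

In this simplified setting the mean condition holds exactly, and the variance differs from $\sigma^2 \Delta t$ by $-(b\,\Delta t)^2 = o(\Delta t)$, which is the pattern required by the first two lines of (33).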

Theorem 4.5: Assume (A1)-(A5), either (A6) or (A7),
and that $G_h$ is a finite set as defined above (32). For
$x \in G_h$ and $\iota \in \mathcal{M}$, let the Markov chain be defined
by (37). Let $V^{h,+}(x, \iota)$ and $V^{h,-}(x, \iota)$ be the associated
upper and lower values defined in (35) and (36) in the control spaces
$\mathcal{U}_i^h(1)$ and $\mathcal{U}_i^h(2)$. Then there exists a saddle
point,

$$V^{h,+}(x, \iota) = V^{h,-}(x, \iota), \tag{39}$$

provided $h$ satisfies (38).

Proof. The contraction condition (9) is satisfied because the
discount factor $\beta > 0$. Let

$$p((x, \iota), (y, \ell) \mid r_1, r_2) = p^h((x, \iota), (y, \ell) \mid r_1, r_2),$$

$$c(x, \iota, r_1, r_2) = e^{-\beta \Delta t^h(x, \iota)} \Delta t^h(x, \iota)\, k(x, \iota, r_1, r_2).$$

Assumptions (A6) and (A7) lead to (H1) and (H2), respectively.
The result follows by applying Theorem 3.3. □

The proof of the next lemma is rather involved and nontrivial.
Owing to space limitations, we refer the reader to the weak
convergence techniques in [7], [5], and [6].

Lemma 4.6: Assume that the conditions of Theorem 4.5
are satisfied. Then for the approximating Markov chain, we
have

$$\lim_{h \to 0} V^{h,+}(x, \iota) = V^+(x, \iota), \tag{40}$$

$$\lim_{h \to 0} V^{h,-}(x, \iota) = V^-(x, \iota). \tag{41}$$

Theorem 4.7: Assume the conditions of Theorem 4.5 are
satisfied. Then the differential game has a saddle point in the
sense that

$$V^+(x, \iota) = V^-(x, \iota). \tag{42}$$

V. Further Remarks 

The key part of zero-sum game problems is the existence of a
saddle point. This paper is devoted to sufficient conditions
for the existence of saddle points in discrete Markov games.
Using the dynamic programming equation method, we are able
to use the static game results of Sion [13] and von Neumann [9]
to obtain the sufficient conditions. A direct application is
numerical methods for stochastic differential game problems.

The transition probabilities used in (37) require the restriction
(38) on $h$. In practice, we construct the transition probabilities
by an upwind finite difference scheme, so that the resulting
chain is well defined without any restriction on $h$. Local
consistency can be verified by routine calculation. This kind
of discrete Markov game might have different upper and
lower values for some $h$. However, both the upper and lower
values in this situation converge to the value $V(x, \iota)$ of the
original differential game by Lemma 4.6 and Theorem 4.7.
Numerical examples in pursuit-evasion games are omitted
owing to space limitations, although the numerical results are
consistent with our findings.

For a regime-switching system in which the Markov chain
has a large state space, we may use the ideas of the
two-time-scale approach presented in [16] (see also [17] and
the references therein) to first reduce the complexity of the
underlying system and then construct numerical solutions for
the limit systems. Optimal strategies of the limit systems can
be used to construct strategies for the original systems,
leading to near optimality.